Add per-subnet attestation aggregate count coverage to metrics#406
Conversation
Greptile SummaryThis PR closes issue #398 by populating the previously reserved
Confidence Score: 5/5Safe to merge — the change is purely additive, emitting new per-subnet gauge series without touching the existing combined total or any caller signatures. The new code is a tight, self-contained addition inside cov_record. The vid % cc bucketing formula is already established and documented at the module level, and the zero-count-always-emitted behavior prevents missing-series artifacts in dashboards. The combined gauge is untouched, so no existing query or alert is affected. Documentation and description are consistent with the implementation. No files require special attention.
|
| Filename | Overview |
|---|---|
| crates/blockchain/src/coverage.rs | Adds per-subnet gauge emission in cov_record using vid % cc bucketing; formula is consistent with cov_add and the module-level comment about gossip subnet assignment. Zero-count subnets are always emitted, preventing missing-series gaps in dashboards. |
| crates/blockchain/src/metrics.rs | Description-only update to lean_attestation_aggregate_coverage_validators; removes the stale 'not yet populated' caveat and documents both label values. No functional change. |
| docs/metrics.md | Labels column updated to show subnet=combined,subnet_0,…,subnet_N-1; accurate and consistent with the code change. |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[cov_record called\nsection, seen, has_subnet] --> B[emit subnet=combined\ncount of all seen validators]
B --> C{cc = has_subnet.len\ncc > 0?}
C -- No --> E[emit lean_attestation_aggregate_coverage_subnets]
C -- Yes --> D[build subnet_counts vec of length cc\nfor each seen vid: subnet_counts at vid mod cc += 1]
D --> F[for each i in 0..cc\nemit subnet=subnet_i gauge]
F --> E
Reviews (1): Last reviewed commit: "Add per-subnet attestation aggreagte cou..." | Re-trigger Greptile
…ction (#414) ## 🗒️ Description / Motivation Ports the five `lean_block_proposal_*` observability metrics from [leanSpec PR #753](leanEthereum/leanSpec#753) into the ethlambda block builder. These metrics give cross-client visibility into the block-proposal attestation-selection path (`build_block`): how long each phase takes, how many proposal builds run, how many child payloads are greedily consumed, and how many distinct `AttestationData` / aggregated proofs end up in the proposed block. They align with [zeam #914](blockblaz/zeam#914 `getProposalAttestations` instrumentation and the leanSpec naming so the [leanMetrics](https://github.com/leanEthereum/leanMetrics) dashboards work across clients. ## What Changed **`crates/blockchain/src/metrics.rs`** — five new metrics registered with the existing `LazyLock` + `register_*!` pattern, a `BLOCK_PROPOSAL_ATTESTATION_BUILD_PHASES` label constant, registration in `init()`, public API functions, and a unit test: | Metric | Type | Buckets / Labels | |--------|------|------------------| | `lean_block_proposal_attestation_build_phase_seconds` | HistogramVec | `phase` = `select_payloads`, `compact`, `stf_simulate`; buckets `0.001…8` | | `lean_block_proposal_attestation_builds_total` | Counter | one per proposal attempt | | `lean_block_proposal_child_payloads_consumed_total` | Counter | greedily-picked proofs before compaction | | `lean_block_proposal_attestation_data_selected` | Histogram | buckets `0, 1, 2, 4, 8, 16, 32` | | `lean_block_proposal_aggregates_selected` | Histogram | buckets `0, 1, 2, 4, 8, 16, 32, 64, 128` | **`crates/blockchain/src/block_builder.rs`** — instruments `build_block`: times the `select_attestations`, `compact_attestations`, and STF (`process_slots` + `process_block`) phases, and emits the counters/histograms after a successful build. **`docs/metrics.md`** — documents all five in the Block Production Metrics table. ## Correctness / Behavior Guarantees - **No behavior change.** Only metric observations were added around existing logic; block contents, selection order, and the state-transition path are untouched. - **Architectural divergence from leanSpec, documented.** leanSpec re-runs the STF inside a fixed-point loop and observes `stf_simulate` per round. ethlambda projects justification/finalization incrementally during selection and runs the STF exactly **once** at the end, so its `stf_simulate` is a single observation per build. This is noted on the `BLOCK_PROPOSAL_ATTESTATION_BUILD_PHASES` doc comment, consistent with the upstream PR's own caveat that phase timings are not directly comparable across clients. - `attestation_data_selected` and `aggregates_selected` are observed from the post-compaction body (one merged proof per distinct `AttestationData`), matching the spec's intent. - Metrics are only emitted on a **successful** build; a build that errors out in the STF is already counted by the existing `lean_block_building_failures_total`. ## Tests Added / Run - New unit test `metrics::tests::block_proposal_attestation_build_metrics_are_usable` — verifies the phase metric registers and accepts every label in `BLOCK_PROPOSAL_ATTESTATION_BUILD_PHASES`, and that the companion counters/histograms are callable. Guards against drift between the label constant and the strings passed at the `build_block` call sites. - Existing `block_builder` tests (`build_block_*`, `compact_attestations_*`, `extend_proofs_greedily_*`) now exercise the new metric paths and pass unchanged. Commands run: - `make fmt` — clean - `cargo clippy -p ethlambda-blockchain --all-targets -- -D warnings` — clean - `cargo test -p ethlambda-blockchain --lib` — 23 passing ## Related Issues / PRs - Ports [leanEthereum/leanSpec#753](leanEthereum/leanSpec#753) - Related to [blockblaz/zeam#914](blockblaz/zeam#914) - Follows the metrics pattern from #406 (per-subnet attestation aggregate coverage) ## ✅ Verification Checklist - [x] Ran `make fmt` — clean - [x] Ran `make lint` (clippy with `-D warnings`) — clean - [ ] Ran `cargo test --workspace --release` — only `ethlambda-blockchain` lib suite run (23 passing); full workspace release run not yet executed
🗒️ Description / Motivation
lean_attestation_aggregate_coverage_validatorsis registered with[section, subnet]labels, but only thesubnet=combinedtotal was emitted, while the per-subnet series was reserved in the metric definition and left empty. This PR populatessubnet=subnet_Nfor each subnet alongside the existing combined total, surfacing per-subnet validator coverage in dashboards/alerts.What Changed
crates/blockchain/src/coverage.rs—cov_recordnow derives a per-subnet validator count from theseenbitmap (vid % committee_countbucketing, same formula as the existing gossip subnet assignment) and emits onesubnet=subnet_<i>gauge per subnet. Removed the resolvedTODO(#398)comment.crates/blockchain/src/metrics.rs— updated thelean_attestation_aggregate_coverage_validatorsdescription to reflect that both label values are now populated.docs/metrics.md— updated the labels column for the same metric.Correctness / Behavior Guarantees
cov_addis unchanged; the new per-subnet counts are derived at emit time insidecov_record. No signature changes, no caller updates.subnet=subnet_<i>series is additive: the existingsubnet=combinedvalue is unchanged, so any pre-existing query/alert keeps working.cov_recordemits per-subnet values for everyi ∈ [0, committee_count), including zero counts — so dashboards see a flat-zero series rather than missing samples when a subnet has no coverage.lean_attestation_aggregate_coverage_subnets(separate metric, counts how many subnets had any coverage) is unchanged.Cardinality
Series =
sections × (subnets + 1)per the issue. With 6 sections (timely,late,block,combined,agg_start_new,proposal_combined) and the typical configuredattestation_committee_count(1–4 on current devnets), the metric tops out at ~30 series. Bounded byATTESTATION_COMMITTEE_COUNT.Tests Added / Run
None added. Verified manually by running a local devnet and confirming
/metricsexposes onesubnet=subnet_Nseries per configured subnet for every section, with the combined-vs-sum-of-subnets invariant holding.Related Issues / PRs
✅ Verification Checklist
make fmt— cleanmake lint(clippy with-D warnings) — cleancargo test --workspace --release— all passing/metricsendpoint's response